Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them on edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness considerations. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. We then propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus it can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
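The abstract does not spell out RELIANT's objective. For orientation only, here is a minimal sketch of the classic soft-target distillation loss through which a student is commonly trained to mimic a teacher on node classification. It is not RELIANT's fairness mechanism, and the function names and default temperature are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) between softened class distributions,
    # multiplied by T^2 so gradient magnitudes stay comparable across T.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return temperature ** 2 * kl


def node_kd_loss(student_out, teacher_out, temperature=2.0):
    # Illustrative helper: average the per-node loss over all nodes of a graph.
    losses = [kd_loss(s, t, temperature) for s, t in zip(student_out, teacher_out)]
    return sum(losses) / len(losses)


print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0: student matches teacher
```

Any fairness-related term would be added on top of a loss like this; its form is not given in the abstract.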
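The Adaptive Moment Estimation (Adam) optimizer is widely used in deep learning tasks because of its fast convergence. However, the convergence of Adam is still not well understood. In particular, existing analyses of Adam cannot clearly demonstrate Adam's advantage over SGD. We attribute this theoretical embarrassment to the $L$-smooth condition (i.e., assuming the gradient is globally Lipschitz continuous with constant $L$) adopted in the literature, which has often been pointed out to fail in practical neural networks. To address this embarrassment, we analyze the convergence of Adam under a relaxed condition called the $(L_0,L_1)$ smoothness condition, which allows the gradient Lipschitz constant to vary with the local gradient norm. The $(L_0,L_1)$ condition is strictly weaker than the $L$-smooth condition and has been empirically verified to hold for practical deep neural networks. Under the $(L_0,L_1)$ smoothness condition, we establish the convergence of Adam with practical hyperparameters. Specifically, we argue that Adam can adapt to the local smoothness condition, which justifies the \emph{adaptivity} of Adam. In contrast, SGD can be arbitrarily slow under this condition. Our results may shed light on the benefits of adaptive gradient methods over non-adaptive ones.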
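To make the gap between the two conditions concrete, take the commonly used one-dimensional form of $(L_0,L_1)$ smoothness, $|f''(x)| \le L_0 + L_1 |f'(x)|$ (assumed here as the formalization). For $f(x) = x^4$, the curvature $f''(x) = 12x^2$ is unbounded, so no global $L$ exists, yet $L_0 = 12$, $L_1 = 3$ works because $12x^2 \le 12 + 12|x|^3$ for all $x$. The sketch below checks this numerically on a grid; the function names are illustrative.

```python
def grad(x):
    # f(x) = x**4, so f'(x) = 4 x^3
    return 4.0 * x ** 3


def hess(x):
    # f''(x) = 12 x^2 grows without bound, so no global L-smoothness constant exists
    return 12.0 * x ** 2


def satisfies_l0_l1(xs, L0, L1):
    # 1-D (L0, L1)-smoothness: |f''(x)| <= L0 + L1 * |f'(x)| at every grid point
    return all(abs(hess(x)) <= L0 + L1 * abs(grad(x)) for x in xs)


xs = [i / 100.0 for i in range(-10000, 10001)]  # grid on [-100, 100]
print(max(hess(x) for x in xs))          # 120000.0: curvature keeps growing
print(satisfies_l0_l1(xs, 12.0, 3.0))    # True
print(satisfies_l0_l1(xs, 1000.0, 0.0))  # False: plain L-smoothness with L = 1000 fails
```

Setting $L_1 = 0$ recovers ordinary $L$-smoothness with $L = L_0$, which is why the relaxed condition is strictly weaker.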
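Since Reddi et al. (2018) pointed out the divergence issue of Adam, many new variants have been designed to obtain convergence. However, vanilla Adam remains extremely popular and works well in practice. Why is there a gap between theory and practice? We point out that there is a mismatch between the settings of theory and practice: Reddi et al. (2018) pick the problem after picking Adam's hyperparameters, i.e., $(\beta_1,\beta_2)$, whereas practical applications usually fix the problem first and then tune $(\beta_1,\beta_2)$. Based on this observation, we conjecture that the empirical convergence can be theoretically justified only if we change the order of picking the problem and the hyperparameters. In this work, we confirm this conjecture. We prove that when $\beta_2$ is large and $\beta_1 < \sqrt{\beta_2} < 1$, Adam converges to a neighborhood of critical points. The size of the neighborhood is proportional to the variance of the stochastic gradients. Under an extra condition (the strong growth condition), Adam converges to critical points. As $\beta_2$ increases, our convergence result can cover any $\beta_1 \in [0,1)$, including $\beta_1 = 0.9$, which is the default setting in deep learning libraries. Our results show that Adam can converge under a wide range of hyperparameters without any modification to its update rule. To the best of our knowledge, we are the first to prove this result without strong assumptions such as bounded gradients. When $\beta_2$ is small, we further point out a large region of hyperparameters $(\beta_1,\beta_2)$ in which Adam can diverge to infinity. Our divergence result considers the same setting as our convergence result, indicating a phase transition from divergence to convergence as $\beta_2$ increases. These positive and negative results can provide suggestions on how to tune Adam's hyperparameters.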
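To make the objects in this result concrete, here is a minimal sketch of vanilla Adam on a scalar parameter, plus a helper that checks the stated hyperparameter condition $\beta_1 < \sqrt{\beta_2} < 1$. The standard bias-corrected form of Adam is an assumption about the variant analyzed, the "large $\beta_2$" requirement is problem-dependent and is not checked, and the function names are illustrative.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    # Vanilla Adam on a scalar parameter (illustrative sketch, no update-rule changes).
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g       # EMA of gradients (first moment)
        v = beta2 * v + (1 - beta2) * g * g   # EMA of squared gradients (second moment)
        m_hat = m / (1 - beta1 ** t)          # bias correction (assumed variant)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def satisfies_beta_condition(beta1, beta2):
    # Checks beta1 < sqrt(beta2) < 1 only; "beta2 large" is problem-dependent.
    return beta1 < math.sqrt(beta2) < 1.0


print(satisfies_beta_condition(0.9, 0.999))  # True
print(satisfies_beta_condition(0.9, 0.5))    # False
x_final = adam_minimize(lambda z: 2.0 * z, x0=1.0, steps=3000)  # f(z) = z^2
```

For the library defaults $(\beta_1,\beta_2) = (0.9, 0.999)$, $\sqrt{0.999} \approx 0.9995 > 0.9$, so the condition holds, whereas $(0.9, 0.5)$ violates it because $\sqrt{0.5} \approx 0.707$.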
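Graph machine learning has recently attracted attention in both academia and industry. Most graph machine learning models, such as Graph Neural Networks (GNNs), are trained on massive amounts of graph data. However, in many real-world scenarios, such as hospitalization prediction in healthcare systems, graph data is usually stored across multiple data owners and cannot be directly accessed by any other party due to privacy concerns and regulatory restrictions. Federated Graph Machine Learning (FGML) is a promising solution to this challenge: it trains graph machine learning models in a federated manner. In this survey, we conduct a comprehensive review of the FGML literature. Specifically, we first provide a new taxonomy that divides existing problems in FGML into two settings, namely, \emph{FL with structured data} and \emph{structured FL}. We then review the mainstream techniques in each setting and elaborate on how they address the challenges of FGML. In addition, we summarize real-world applications of FGML across different domains and introduce the open graph datasets and platforms adopted in FGML. Finally, we present several limitations of existing studies and promising research directions in this field.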
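The survey covers many techniques. As one standard illustration of "training in a federated manner", the sketch below shows FedAvg-style server aggregation: each data owner trains a local model on its own graph data and sends only parameters, and the server averages them weighted by local sample counts. FedAvg is a generic choice here, not a method attributed to the survey; flat parameter lists and the function name are assumptions.

```python
def fedavg(client_params, client_sizes):
    # Server-side FedAvg aggregation (illustrative): average each parameter
    # across clients, weighting client c by n_c / sum(n). Raw data never moves.
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical data owners (e.g. hospitals) with 1 and 3 local training samples.
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_params)  # [2.5, 3.5]
```

In a full round, the server would broadcast `global_params` back to the clients for the next phase of local training.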
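Graph Neural Networks (GNNs) have shown satisfying performance on various graph analytical problems. Hence, they have become the \emph{de facto} solution in a variety of decision-making scenarios. However, GNNs can yield biased results against certain demographic subgroups. Some recent works have empirically shown that the biased structure of the input network is a significant source of bias for GNNs. Nevertheless, no work has systematically scrutinized which part of the input network structure leads to biased predictions for any given node. The low transparency about how the structure of the input network influences the bias in GNN outcomes largely limits the safe adoption of GNNs in various decision-critical scenarios. In this paper, we study a novel research problem of structural explanation of bias in GNNs. Specifically, we propose a novel post-hoc explanation framework that identifies two edge sets: one that maximally accounts for the exhibited bias, and one that maximally contributes to the fairness level of the GNN prediction for any given node. Such explanations not only provide a comprehensive understanding of the bias/fairness of GNN predictions but also have practical significance for building effective yet fair GNN models. Extensive experiments on real-world datasets validate the effectiveness of the proposed framework in delivering effective structural explanations for the bias of GNNs. Open-source code can be found at https://github.com/yushundong/referee.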
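The abstract does not name a bias metric. As an assumption, one widely used way to quantify "biased results against certain demographic subgroups" in node classification is the statistical parity difference $\Delta_{SP} = |P(\hat{y}=1 \mid s=0) - P(\hat{y}=1 \mid s=1)|$ for a binary sensitive attribute $s$. This is a group-level measure, not the node-level explanation objective of the proposed framework; the data below is hypothetical.

```python
def statistical_parity_difference(preds, sensitive):
    # |P(y_hat=1 | s=0) - P(y_hat=1 | s=1)| for binary predictions y_hat
    # and a binary sensitive attribute s (group-level, not per-node).
    def positive_rate(group):
        members = [p for p, s in zip(preds, sensitive) if s == group]
        return sum(members) / len(members)

    return abs(positive_rate(0) - positive_rate(1))


preds = [1, 1, 0, 0, 1, 0]      # hypothetical GNN node predictions
sensitive = [0, 0, 0, 1, 1, 1]  # hypothetical demographic group per node
print(statistical_parity_difference(preds, sensitive))  # about 0.333
```

A value of 0 means both groups receive positive predictions at the same rate.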
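Graph Neural Networks (GNNs) have shown great power in learning node representations on graphs. However, they may inherit historical biases from the training data, leading to discriminatory bias in predictions. Although some works have developed fair GNNs, most of them directly borrow fair representation learning techniques from non-graph domains without considering the potential problem of sensitive attribute leakage caused by feature propagation in GNNs. We empirically observe that feature propagation can change the correlation of previously innocuous features with the sensitive features. This can be viewed as a leakage of sensitive information, which can further exacerbate discrimination in predictions. Therefore, we design two feature masking strategies based on feature correlations to highlight the importance of considering feature propagation and correlation variation in mitigating discrimination. Motivated by our analysis, we propose Fair View Graph Neural Network (FairVGNN), which generates fair views of features by automatically identifying and masking sensitive-correlated features, taking into account the correlation variation after feature propagation. Given the learned fair views, we adaptively clamp the weights of the encoder to avoid using sensitive-related features. Experiments on real-world datasets demonstrate that FairVGNN achieves a better trade-off between model utility and fairness. Our code is publicly available at https://github.com/yuwvandy/fairvgnn.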
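To illustrate the empirical observation (not FairVGNN itself, which learns the fair view and adaptively clamps encoder weights), the sketch below builds a toy graph in which both features are uncorrelated with the sensitive attribute before propagation, yet one becomes correlated after a single round of mean aggregation. A hard top-$k$ mask computed on the propagated features then removes it. Mean aggregation, the toy data, and the top-$k$ rule are assumptions chosen for clarity.

```python
import math


def pearson(a, b):
    # Pearson correlation of two equal-length lists; 0.0 if either is constant.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def propagate(features, neighbors):
    # One round of mean aggregation over each node and its neighbors.
    out = []
    for i, row in enumerate(features):
        group = [i] + neighbors[i]
        out.append([sum(features[j][k] for j in group) / len(group) for k in range(len(row))])
    return out


def sensitive_correlations(features, sensitive):
    # |corr(feature column, sensitive attribute)| for every column.
    return [abs(pearson(list(col), sensitive)) for col in zip(*features)]


def mask_top_k(features, sensitive, k):
    # Zero the k columns most correlated with the sensitive attribute (hard mask).
    scores = sensitive_correlations(features, sensitive)
    masked = set(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])
    return [[0.0 if j in masked else x for j, x in enumerate(row)] for row in features], masked


# Toy graph: 4 nodes, sensitive attribute s, one undirected edge 0-3.
s = [0, 0, 1, 1]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
nbrs = [[3], [], [], [0]]
print(sensitive_correlations(X, s))                   # both 0.0 before propagation
print(sensitive_correlations(propagate(X, nbrs), s))  # column 0 rises to about 0.707
```

Masking on the raw features would find nothing to remove here, which is the point of accounting for correlation variation after propagation.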
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models, which are critical for real-world applications, benefit only marginally or not at all from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token- and feature-based distillation; 2) when the depth of the student does not match that of the teacher, using an intermediate layer of the teacher network as the target performs better than using the last layer; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification for ViT-Tiny, ViT-Small, and ViT-Base models, with gains of +4.2%/+2.4%/+1.4%, respectively. Our base-size TinyMIM model achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 mIoU higher than the MAE baseline. Our tiny-size TinyMIM model achieves 79.6% top-1 accuracy on ImageNet-1K image classification, setting a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way to develop small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as most previous works do. Code is available at https://github.com/OliverRensu/TinyMIM.
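Finding 1) favors distilling token relations. A minimal sketch, assuming relations are softmax-normalized token-to-token similarity maps matched with a KL divergence; the abstract does not say which features TinyMIM relates (e.g., queries, keys, or values), and the names are illustrative. One practical appeal is that an $N \times N$ map lets a narrow student match a wide teacher without a projection layer.

```python
import math


def relation_map(tokens):
    # Row-wise softmax of scaled dot products between tokens: an N x N map
    # whose size does not depend on the feature width d.
    d = len(tokens[0])
    rows = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)
        exps = [math.exp(x - m) for x in scores]
        z = sum(exps)
        rows.append([e / z for e in exps])
    return rows


def relation_kd_loss(teacher_tokens, student_tokens):
    # Mean over tokens of KL(teacher row || student row); teacher and student
    # may have different widths as long as they see the same N tokens.
    rt, rs = relation_map(teacher_tokens), relation_map(student_tokens)
    total = 0.0
    for pt_row, ps_row in zip(rt, rs):
        total += sum(p * (math.log(p) - math.log(q)) for p, q in zip(pt_row, ps_row))
    return total / len(rt)


teacher = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]  # d = 4
student = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                                # d = 2
print(relation_kd_loss(teacher, student))  # small positive value
```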
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into the multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique for improving data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain performance comparable to that of a model trained on the original training dataset. However, existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility; the security risks stemming from them have not been explored. This study performs the first backdoor attack against models trained on data distilled by dataset distillation methods in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. The empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent them.
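A minimal sketch of the NAIVEATTACK idea, poisoning the raw data once before distillation starts, together with the ASR metric reported above. The square corner patch, appending relabeled copies rather than replacing originals, and all names are illustrative assumptions; DOORPING's iterative trigger updates during distillation are not shown.

```python
def add_patch_trigger(image, size=2, value=1.0):
    # Stamp a size x size square of constant pixels in the bottom-right corner
    # of a 2-D grayscale image (list of rows); returns a new image.
    out = [row[:] for row in image]
    h, w = len(out), len(out[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            out[r][c] = value
    return out


def poison(images, labels, target_label, size=2):
    # Naive poisoning of the raw data (assumed form): append a triggered copy
    # of every image, relabeled to the attacker's target class.
    triggered = [add_patch_trigger(img, size) for img in images]
    return images + triggered, labels + [target_label] * len(images)


def attack_success_rate(predict, images, target_label, size=2):
    # Fraction of triggered inputs classified as the target label; in practice
    # this is measured on test inputs whose true label is not the target.
    hits = sum(predict(add_patch_trigger(img, size)) == target_label for img in images)
    return hits / len(images)


clean = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]  # three blank 4x4 images
X_poisoned, y_poisoned = poison(clean, [0, 1, 2], target_label=9)
print(len(X_poisoned), y_poisoned)  # 6 [0, 1, 2, 9, 9, 9]
```

The poisoned set would then be handed to the dataset distillation method in place of the clean raw data.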
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation of image content, which complicate distortion patterns across different scales and aggravate the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT). These modules help the model learn complex distortion patterns and better optimize the regression problem, following the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets; the results indicate that PMT-IQA outperforms the compared approaches and that both the MS and PMT modules improve the model's performance.